Bias and variance in the social structure of gender

Authors

  • Kristen M. Altenburger
  • Johan Ugander
Abstract

The observation that individuals tend to be friends with people who are similar to themselves, commonly known as homophily, is a prominent and well-studied feature of social networks. Many machine learning methods exploit homophily to predict attributes of individuals based on the attributes of their friends. Meanwhile, recent work has shown that gender homophily can be weak or nonexistent in practice, making gender prediction particularly challenging. In this work, we identify another useful structural feature for predicting gender, an overdispersion of gender preferences introduced by individuals who have extreme preferences for a particular gender, regardless of their own gender. We call this property monophily for “love of one,” and jointly characterize the statistical structure of homophily and monophily in social networks in terms of preference bias and preference variance. For prediction, we find that this pattern of extreme gender preferences introduces friend-of-friend correlations, where individuals are similar to their friends-of-friends without necessarily being similar to their friends. We analyze a population of online friendship networks in U.S. colleges and offline friendship networks in U.S. high schools and observe a fundamental difference between the success of prediction methods based on friends, “the company you keep,” compared to methods based on friends-of-friends, “the company you’re kept in.” These findings offer an alternative perspective on attribute prediction in general and gender in particular, complicating the already difficult task of protecting attribute privacy.

Homophily is the observed phenomenon in social networks whereby friendships form frequently among similar individuals [29, 34]. Homophily can originate from an individual’s personal preference to become friends with similar others (choice homophily), structural opportunities to interact with similar others (induced homophily), or a combination of both [26].
An important consequence of homophily is that even if an individual does not disclose attribute information about themselves (such as their gender, age, or race), methods for relational learning [37, 22, 31, 45, 3, 48] can often leverage attributes disclosed by that individual’s friends to predict their private attributes. Gender prediction, however, is a difficult relational learning problem, as gender homophily can be weak or non-existent in both online and offline settings [53, 49, 46, 36, 27]. Weak gender homophily motivates us to examine alternative network structures useful for attribute prediction [13]. In this work, we focus on gender prediction and document the presence of individuals in social networks with extreme gender preferences for a particular gender, regardless of their own gender. We call this overdispersion of preferences “monophily” to indicate it as distinct from the preference bias introduced by homophily, and observe that monophily is nearly ubiquitous across the population of online and offline friendship networks that we study. The presence of these individuals with extreme preferences introduces similarity among friends-of-friends or along 2-hop relations. For the practical problem of attribute prediction, being friends with an individual with extreme gender preferences is a strong signal of one’s own gender and is therefore useful for gender prediction. In order to model these empirical observations, as part of this work we also introduce an overdispersed stochastic block model that enables us to separately simulate homophily and monophily in social networks. We show how the 2-hop structural relationship induced by overdispersion (monophily) can exist in the complete absence of any 1-hop bias (homophily), and find that overdispersed friendship preferences can drive successful classification algorithms in settings with weak or even no homophily. 
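The claim that 2-hop similarity can arise without any 1-hop bias can be illustrated with a small simulation. The following is a toy sketch, not the authors' overdispersed stochastic block model: the Beta-distributed preference `q`, the edge rule, the parameter values, and all function names are assumptions made here for illustration. Each node draws a preference for female friends that is independent of its own gender (so there is no homophily on average), and each score uses the true labels of all other nodes rather than a train/test split.

```python
import numpy as np

def simulate_overdispersed_network(n_per_class=200, scale=0.2, dispersion=0.5, seed=0):
    """Toy network with extreme (overdispersed) gender preferences and no homophily."""
    rng = np.random.default_rng(seed)
    n = 2 * n_per_class
    is_f = np.arange(n) < n_per_class            # first half F, second half M
    # q[i]: node i's preference for F friends, drawn independently of i's own
    # gender. A small `dispersion` pushes q toward 0 or 1 (extreme preferences).
    q = rng.beta(dispersion, dispersion, size=n)
    # w[i, j]: node i's preference for the gender of node j
    w = np.where(is_f[None, :], q[:, None], 1.0 - q[:, None])
    prob = scale * w * w.T                       # symmetric edge probability
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    adj = (upper | upper.T).astype(int)          # undirected, no self-loops
    return adj, is_f

def one_hop_score(adj, is_f):
    """Fraction of a node's friends who are F ("the company you keep")."""
    deg = adj.sum(axis=1)
    return (adj @ is_f.astype(int)) / np.maximum(deg, 1)

def two_hop_score(adj, is_f):
    """Fraction of 2-hop paths ending at an F node ("the company you're kept in")."""
    paths = adj @ adj
    np.fill_diagonal(paths, 0)                   # drop paths that return to the node itself
    total = paths.sum(axis=1)
    return (paths @ is_f.astype(int)) / np.maximum(total, 1)

def accuracy(score, is_f):
    """Accuracy of the rule 'predict F when the score exceeds 0.5'."""
    return float(np.mean((score > 0.5) == is_f))

adj, is_f = simulate_overdispersed_network()
acc_1hop = accuracy(one_hop_score(adj, is_f), is_f)
acc_2hop = accuracy(two_hop_score(adj, is_f), is_f)
print(f"1-hop accuracy: {acc_1hop:.2f}, 2-hop accuracy: {acc_2hop:.2f}")
```

In this toy setting a node's own friends reveal only its preference, not its gender, whereas the friends of its friends reveal the gender that those friends prefer, which is the node's own gender.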
Therefore, in networks with weak homophily but strong monophily, your friends-of-friends (“the company you’re kept in”) can be responsible for disclosing private attribute information, as opposed to your friends (“the company you keep”). These findings extend the importance of privacy policies that protect relational data, while also proposing an intuitive structural property of social networks of independent interest.

∗, † Department of Management Science & Engineering, Stanford University.
arXiv:1705.04774v1 [cs.SI] 13 May 2017

In the spirit of a solution-oriented science [55], our analysis addresses the practical problem of inferring gender on social networks by revisiting the social theory of homophily and introducing alternative considerations for heterogeneity in friendship preferences. In addition to improving prediction, we also present monophily as an independent structure of interest when studying “gender as a social structure” [42] by explicitly quantifying the variability in gender preferences beyond the bias captured by homophily. Only recently has the role of variability in general and overdispersion in particular been studied on social networks, where classic perspectives have prioritized analyzing aggregate patterns of interaction [40]. This work follows other advances in incorporating variance and overdispersion in social data analysis, including understanding the consequences of overdispersion when estimating the size of sub-populations [60], documenting variations in the homophily of political ideology [4], assessing gender variation in linguistic patterns [2], and inferring social structure based on indirectly observed data [32]. The paper proceeds by first establishing how we measure the bias (homophily) and excess variance (monophily) of gender preferences.
We then examine how relational inference methods for node classification relate to the presence of homophily and/or monophily. While previous models of homophily have shown its statistical significance in network data [58, 47], we highlight that the statistical significance of homophily does not necessarily imply predictive power when the task is to infer private attributes. Following the empirical analysis, we introduce a network model of overdispersed preferences that generalizes the well-studied stochastic block model [21]. Throughout this work we view gender as a binary attribute and aim to measure homophily in a manner that encompasses all sources of preference due to both choice and induced homophily. While we focus on gender, the methods developed in this work contribute a broad statistical toolkit for the general study of variability in social group interactions across a wide range of attributes or traits.

We begin by showing how the conventional homophily index can be interpreted as the maximum likelihood estimate of a parameter within a simple generalized linear model. We then extend this model to capture overdispersed preferences using a quasi-likelihood approach, introducing an overdispersed model with additional parameters that concisely measure the overdispersion of gender preferences among females (F) and males (M), respectively. We propose estimates of these parameters as our measures of monophily among females and males in network data. The homophily index of a graph [7, 10] characterizes the aggregate pattern of individuals’ biases or preferences in forming friendships with people of their own attribute class relative to people from other classes.
For a generic attribute class $r$ and assuming there are $k = 2$ classes, the homophily index with respect to class $r$ is defined as

$$\hat{h}_r = \frac{\sum_{i \in r} d_{i,\mathrm{in}}}{\sum_{i \in r} d_{i,\mathrm{in}} + \sum_{i \in r} d_{i,\mathrm{out}}} = \frac{\sum_{i \in r} d_{i,\mathrm{in}}}{\sum_{i \in r} d_i}, \tag{1}$$

where $d_{i,\mathrm{in}}$ denotes node $i$’s observed in-class degree with similar others, $d_{i,\mathrm{out}}$ denotes its observed out-class degree with different others, $d_i$ denotes its observed total degree, and $n_r$ represents the total number of nodes with attribute $r$, such that $N = \sum_{r=1}^{k} n_r$. For notational simplicity, we use $i \in r$ to refer to the set of all nodes with attribute value $r$. In measuring binary gender homophily (i.e., $r = F$ or $r = M$), we first illustrate how to measure homophily among females. We assume that each individual $i \in F$ in a network forms in-class connections with the other $n_F$ individuals at a rate $p_{\mathrm{in},F}$ and out-class ties with the other $n_M$ individuals at a rate $p_{\mathrm{out}}$ (and similarly, that each individual $i \in M$ forms connections with males at a rate $p_{\mathrm{in},M}$ and with females at a rate $p_{\mathrm{out}}$). We therefore expect for each individual $i \in F$ that their class-specific degrees obey the following distributions (permitting self-loops):

$$D_{i,\mathrm{in}} \mid p_{\mathrm{in},F} \sim \mathrm{Binom}(n_F, p_{\mathrm{in},F}), \tag{2}$$

$$D_{i,\mathrm{out}} \mid p_{\mathrm{out}} \sim \mathrm{Binom}(n_M, p_{\mathrm{out}}), \tag{3}$$

$$D_i \mid p_{\mathrm{in},F}, p_{\mathrm{out}} = D_{i,\mathrm{in}} \mid p_{\mathrm{in},F} + D_{i,\mathrm{out}} \mid p_{\mathrm{out}}, \tag{4}$$

where $D_{i,\mathrm{in}}$ is a random variable describing the in-class degree, $D_{i,\mathrm{out}}$ describes the out-class degree, and $D_i$ describes the total degree of node $i$ in class $F$. We explicitly condition these random variables on the parameters $p_{\mathrm{in},F}$ and $p_{\mathrm{out}}$ to make clear that these parameters are, for now, fixed and constant. The nodes $i \in M$ have the same binomial degree distribution, specified by in-class degrees formed among the $n_M$ nodes at a rate $p_{\mathrm{in},M}$ and out-class degrees formed among the $n_F$ nodes at a rate $p_{\mathrm{out}}$. With only $k = 2$ classes, for simplicity we use the notation $p_{\mathrm{out}}$ in place of, e.g., $p_{\mathrm{out},r,s}$, while noting that the rates could depend on the specific in- and out-classes $r$ and $s$ in the most general directed multi-class case.
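As a minimal sketch of equation (1) and the binomial degree model in equations (2) and (3), the code below computes the homophily index from an undirected edge list and then checks it against simulated degrees. The function names, the toy graph, and the parameter values are hypothetical choices for illustration. The simulation draws each node's in-class and out-class degrees independently instead of building a consistent graph, which is a simplifying assumption.

```python
import numpy as np

def homophily_index(edges, gender, cls):
    """Equation (1): share of the total degree of class `cls` that stays in-class."""
    d_in = 0
    d_total = 0
    for u, v in edges:
        # An undirected edge contributes to the degree of both endpoints.
        for a, b in ((u, v), (v, u)):
            if gender[a] == cls:
                d_total += 1
                if gender[b] == cls:
                    d_in += 1
    return d_in / d_total if d_total else float("nan")

# Hypothetical toy graph: three F nodes (a, b, c) and two M nodes (d, e).
toy_edges = [("a", "b"), ("a", "d"), ("b", "d"), ("c", "e"), ("d", "e")]
toy_gender = {"a": "F", "b": "F", "c": "F", "d": "M", "e": "M"}
h_f_toy = homophily_index(toy_edges, toy_gender, "F")  # in-class 2 of total 5

def simulate_h_f(n_f, n_m, p_in_f, p_out, seed=0):
    """Draw degrees for the n_f female nodes from equations (2)-(3), then apply (1)."""
    rng = np.random.default_rng(seed)
    d_in = rng.binomial(n_f, p_in_f, size=n_f)    # D_{i,in}  ~ Binom(n_F, p_in,F)
    d_out = rng.binomial(n_m, p_out, size=n_f)    # D_{i,out} ~ Binom(n_M, p_out)
    return d_in.sum() / (d_in.sum() + d_out.sum())

h_f_sim = simulate_h_f(n_f=500, n_m=500, p_in_f=0.02, p_out=0.01)
# Ratio of expected in-class degree to expected total degree under the model.
h_f_expected = (500 * 0.02) / (500 * 0.02 + 500 * 0.01)
print(h_f_toy, round(h_f_sim, 3), round(h_f_expected, 3))
```

With fixed rates, the simulated index concentrates near the ratio of expected in-class degree to expected total degree, $n_F p_{\mathrm{in},F} / (n_F p_{\mathrm{in},F} + n_M p_{\mathrm{out}})$.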
Note that the random variables in equations (2)–(4) are approximately independent,

Similar resources

Italian Political Communication and Gender Bias: Press Representations of Men/Women Presidents of the Houses of Parliament (1979, 1994, and 2013)

The study considers mass media communication as intertwined with social norms, as assumed by the perspective of social representations. It explores the Italian press communication by focusing on three pairs of men and women politicians with different political orientations and all serving as presidents of the Houses of Parliament in three legislatures. The article concentrates on five newspaper...

P95: Self-Focus Attention and Interpretation Bias in Social Phobia Patients and Compared with Normal Subjects

This study compared self-focused attention and interpretation biases in social phobia patients and normal subjects. In this causal-comparative study, 100 subjects with social phobia and 100 normal individuals were compared. Subjects were selected through cluster sampling among Tabriz University students. Data scale of social phobia, attention bias and interpretation of the data was performe...

Interaction Between Race and Gender and Effect on Implicit Racial Bias Against Blacks

Background and aims: ...

Gender Representation in Interchange (Third Edition) Series: A Social Semiotics Analysis

Gender representation has long been studied in both verbal and visual modes of ELT textbooks. However, regarding the visual mode, research has mainly focused on superficial analyses of how often each gender appears in different roles rather than on how the two genders are represented. The tools proposed in Kress and van Leeuwen’s (2006) social semiotics framework, however, permit deep ana...

Evaluating an Instructional Textbook: A Critical Discourse Perspective

A critical discourse analysis (CDA) of English language teaching (ELT) textbooks can provide a theoretical description of existing ideological effects in the texts and a means to link linguistic and social practices. This study, thus, seeks to evaluate Summit 2B (i.e., the advanced book of Top Notch series) with a focus on the representation of male and female social actors. In so doing, this s...

Interpreting Ambiguous Social Situations in Social Anxiety: Application of Computerized Task Measuring Interpretation Bias

Background and Aims: The interpretation bias which is an important factor in the pathology of social anxiety disorder, has been recently considered in therapeutic approaches. Given the importance of interpretation bias in the treatment of social anxiety, and despite the ambiguity in the relationship between social anxiety and interpretation bias, we compared the interpretation bias in individua...

Journal title:
  • CoRR

Volume abs/1705.04774

Publication date 2017